Dynamic Load Balancing in Stream Processing Pipelines Containing Stream-Static Joins

Abstract

Data stream processing systems are used to continuously run mission-critical applications for real-time monitoring and alerting. These systems require high throughput and low latency to process incoming data streams in real time. However, changes in the distribution of incoming data over time can cause partition skew, which is defined as an unequal distribution of data partitions among workers, resulting in sub-optimal processing due to unbalanced load. This paper presents the first solution designed specifically to address partition skew in the context of joining streaming and static data. Our solution uses state-of-the-art principles to monitor processing load, detect load imbalance, and dynamically redistribute partitions to achieve optimal load balance. To accomplish this, our solution leverages the collocation of streaming and static data, while considering the processing load of the join and of subsequent operations. Finally, we present the results of an experimental evaluation, in which we compared our solution in four stream processing pipelines containing such a join. The results show that our solution achieved significantly higher throughput and lower latency than competing approaches.
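
The monitor, detect, and redistribute loop described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the greedy "heaviest partition to least loaded worker" rule, the imbalance metric (maximum worker load divided by mean load), and all names (`imbalance`, `worker_loads`, `rebalance`) are assumptions chosen for illustration. The sketch only computes a partition-to-worker assignment and does not model moving the collocated static data.

```python
from typing import Dict, List


def worker_loads(partition_load: Dict[int, float],
                 assignment: Dict[int, str],
                 workers: List[str]) -> Dict[str, float]:
    """Sum the measured load of the partitions assigned to each worker."""
    load = {w: 0.0 for w in workers}
    for pid, worker in assignment.items():
        load[worker] += partition_load[pid]
    return load


def imbalance(load: Dict[str, float]) -> float:
    """Hypothetical skew metric: max worker load / mean load (1.0 = balanced)."""
    mean = sum(load.values()) / len(load)
    return max(load.values()) / mean if mean > 0 else 1.0


def rebalance(partition_load: Dict[int, float],
              workers: List[str]) -> Dict[int, str]:
    """Greedy reassignment: place the heaviest partitions first,
    each on the currently least loaded worker."""
    load = {w: 0.0 for w in workers}
    assignment: Dict[int, str] = {}
    ordered = sorted(partition_load.items(), key=lambda kv: kv[1], reverse=True)
    for pid, cost in ordered:
        target = min(workers, key=lambda w: load[w])
        assignment[pid] = target
        load[target] += cost
    return assignment


# Example: measured per-partition load and a skewed initial assignment.
partition_load = {0: 8.0, 1: 1.0, 2: 1.0, 3: 6.0, 4: 2.0, 5: 2.0}
workers = ["w0", "w1"]
skewed = {0: "w0", 3: "w0", 1: "w1", 2: "w1", 4: "w1", 5: "w1"}

THRESHOLD = 1.2  # assumed trigger level for redistribution
before = imbalance(worker_loads(partition_load, skewed, workers))
if before > THRESHOLD:
    balanced = rebalance(partition_load, workers)
    after = imbalance(worker_loads(partition_load, balanced, workers))
    print(f"imbalance before={before:.2f}, after={after:.2f}")
```

In a stream-static join, each reassignment would also require relocating the matching static-data partition, so a practical balancer would likely weigh migration cost against the expected gain; that trade-off is left out of this sketch.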

Similar resources

Adaptive Load Diffusion for Stream Joins

Data stream processing has become increasingly important as many emerging applications call for sophisticated realtime processing over data streams, such as stock trading surveillance, network traffic monitoring, and sensor data analysis. Stream joins are among the most important stream processing operations, which can be used to detect linkages and correlations between different data streams. ...

Static Optimisation vs. Dynamic Evaluation for Data Stream Processing

The work presented in this dissertation offers a quantitative comparison between two different execution frameworks for queries over data streams. The first framework is the static one. Its optimiser decides the execution plan, and it orders the operators according to it. Then, it schedules the incoming data through these operators. The plan is fixed and it cannot change throughout the processin...

Stream-processing pipelines: processing of streams on multiprocessor architecture

In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four cases of communication organization are considered and compared. Examples of application are given fo...

PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning

In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur large communication cost. As queries run continuously, the precious bandwidths would be aggressively consumed without careful optimization of operator ordering and placement. In this paper, we focus on the optimization of continuous multi-join...

Approximate Data Stream Joins in Distributed Systems

The emergence of applications producing continuous high-frequency data streams has brought forth a large body of research in the area of distributed stream processing. In the presence of high volumes of data, efforts have primarily concentrated on providing approximate aggregate or top-k type results. Scalable solutions for providing answers to window join queries in distributed stream processing s...

Journal

Journal title: Electronics

Year: 2023

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics12071613